Voice-based query interface for databases: Stanford Parser

A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as “phrases”) and which words are the subject or object of a verb.
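To make "which groups of words go together" concrete, here is a minimal sketch that reads a constituency tree written in bracketed, Penn Treebank-style notation and lists the words under every phrase with a given label (NP = noun phrase, VP = verb phrase, PP = prepositional phrase). The tree for the query "Show me the Symptoms of Cancer" is hand-written for illustration and is not captured Stanford Parser output; `read_tree`, `leaves` and `phrases` are hypothetical helpers, not part of any parser API.

```python
# Hand-written, illustrative tree (an assumption, not real parser output).
TREE = ("(ROOT (S (VP (VB Show) (NP (PRP me)) "
        "(NP (NP (DT the) (NNPS Symptoms)) (PP (IN of) (NP (NNP Cancer)))))))")


def read_tree(s):
    """Parse a bracketed tree into nested (label, children) tuples; words are plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())   # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return parse()


def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def phrases(node, label):
    """Collect the word sequence of every constituent carrying `label`, outer phrases first."""
    if isinstance(node, str):
        return []
    found = [" ".join(leaves(node))] if node[0] == label else []
    for child in node[1]:
        found.extend(phrases(child, label))
    return found
```

On the hand-written tree, `phrases(read_tree(TREE), "NP")` returns `['me', 'the Symptoms of Cancer', 'the Symptoms', 'Cancer']`: a nested noun phrase is reported both as a whole and as its parts.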

Dan Klein wrote the original version of this parser, with support code and linguistic grammar development by Christopher Manning.

A part-of-speech tagger (POS tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb or adjective, to each word (and other token). Computational applications generally use more fine-grained POS tags, such as ‘noun-plural’.
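The idea of assigning a tag to every token can be illustrated with a deliberately naive tagger: look the lowercased word up in a tiny hand-made lexicon, and otherwise guess a noun tag from capitalization and a trailing "s". This is only a toy under those assumptions (`LEXICON` and `naive_tag` are hypothetical); the Stanford tagger instead uses a trained statistical model that also looks at context. The toy does show the fine-grained split between singular and plural nouns (NN vs NNS, NNP vs NNPS).

```python
# Hypothetical mini-lexicon: lowercased word -> Penn Treebank tag.
LEXICON = {"show": "VB", "me": "PRP", "the": "DT", "of": "IN", "what": "WP", "are": "VBP"}


def naive_tag(tokens):
    """Tag each token by lexicon lookup, falling back to crude noun guesses."""
    tagged = []
    for tok in tokens:
        low = tok.lower()
        if low in LEXICON:
            tag = LEXICON[low]
        elif tok[:1].isupper():  # capitalized unknown word -> proper noun
            tag = "NNPS" if tok.endswith("s") else "NNP"
        elif tok.endswith("s"):  # lowercase word ending in "s" -> plural noun
            tag = "NNS"
        else:
            tag = "NN"
        tagged.append((tok, tag))
    return tagged
```

For `naive_tag("Show me the Symptoms of Cancer".split())` the toy happens to produce the same tags as the tagging example in this post; on less convenient sentences its guesses break down quickly.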

Example:

Query

Show me the Symptoms of Cancer

Tagging

Show/VB    me/PRP    the/DT    Symptoms/NNPS    of/IN    Cancer/NNP
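Tagged output in the word/TAG form shown above is easy to turn into (word, tag) pairs for the keyword-matching step. A minimal sketch, assuming whitespace-separated tokens and "/" as the separator (`parse_tagged` is a hypothetical helper, not a Stanford API):

```python
def parse_tagged(text, sep="/"):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST separator, so a word that itself
        # contains the separator (e.g. "and/or/CC") stays intact.
        word, found, tag = token.rpartition(sep)
        if not found or not word:
            raise ValueError(f"not a word{sep}TAG token: {token!r}")
        pairs.append((word, tag))
    return pairs
```

Applied to the tagging above, `parse_tagged("Show/VB    me/PRP    the/DT    Symptoms/NNPS    of/IN    Cancer/NNP")` returns `[('Show', 'VB'), ('me', 'PRP'), ('the', 'DT'), ('Symptoms', 'NNPS'), ('of', 'IN'), ('Cancer', 'NNP')]`.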

Alphabetical list of the part-of-speech tags used (the Penn Treebank tag set):

<table>
  <tr>
    <td>
      <div align="left">
        Number
      </div>
    </td>
    <td>
      <div align="left">
        Tag
      </div>
    </td>
    <td>
      <div align="left">
        Description
      </div>
    </td>
  </tr>
  <tr><td>1.</td><td>CC</td><td>Coordinating conjunction</td></tr>
  <tr><td>2.</td><td>CD</td><td>Cardinal number</td></tr>
  <tr><td>3.</td><td>DT</td><td>Determiner</td></tr>
  <tr><td>4.</td><td>EX</td><td>Existential <i>there</i></td></tr>
  <tr><td>5.</td><td>FW</td><td>Foreign word</td></tr>
  <tr><td>6.</td><td>IN</td><td>Preposition or subordinating conjunction</td></tr>
  <tr><td>7.</td><td>JJ</td><td>Adjective</td></tr>
  <tr><td>8.</td><td>JJR</td><td>Adjective, comparative</td></tr>
  <tr><td>9.</td><td>JJS</td><td>Adjective, superlative</td></tr>
  <tr><td>10.</td><td>LS</td><td>List item marker</td></tr>
  <tr><td>11.</td><td>MD</td><td>Modal</td></tr>
  <tr><td>12.</td><td>NN</td><td>Noun, singular or mass</td></tr>
  <tr><td>13.</td><td>NNS</td><td>Noun, plural</td></tr>
  <tr><td>14.</td><td>NNP</td><td>Proper noun, singular</td></tr>
  <tr><td>15.</td><td>NNPS</td><td>Proper noun, plural</td></tr>
  <tr><td>16.</td><td>PDT</td><td>Predeterminer</td></tr>
  <tr><td>17.</td><td>POS</td><td>Possessive ending</td></tr>
  <tr><td>18.</td><td>PRP</td><td>Personal pronoun</td></tr>
  <tr><td>19.</td><td>PRP$</td><td>Possessive pronoun</td></tr>
  <tr><td>20.</td><td>RB</td><td>Adverb</td></tr>
  <tr><td>21.</td><td>RBR</td><td>Adverb, comparative</td></tr>
  <tr><td>22.</td><td>RBS</td><td>Adverb, superlative</td></tr>
  <tr><td>23.</td><td>RP</td><td>Particle</td></tr>
  <tr><td>24.</td><td>SYM</td><td>Symbol</td></tr>
  <tr><td>25.</td><td>TO</td><td><i>to</i></td></tr>
  <tr><td>26.</td><td>UH</td><td>Interjection</td></tr>
  <tr><td>27.</td><td>VB</td><td>Verb, base form</td></tr>
  <tr><td>28.</td><td>VBD</td><td>Verb, past tense</td></tr>
  <tr><td>29.</td><td>VBG</td><td>Verb, gerund or present participle</td></tr>
  <tr><td>30.</td><td>VBN</td><td>Verb, past participle</td></tr>
  <tr><td>31.</td><td>VBP</td><td>Verb, non-3rd person singular present</td></tr>
  <tr><td>32.</td><td>VBZ</td><td>Verb, 3rd person singular present</td></tr>
  <tr><td>33.</td><td>WDT</td><td>Wh-determiner</td></tr>
  <tr><td>34.</td><td>WP</td><td>Wh-pronoun</td></tr>
  <tr><td>35.</td><td>WP$</td><td>Possessive wh-pronoun</td></tr>
  <tr><td>36.</td><td>WRB</td><td>Wh-adverb</td></tr>
</table>
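Several tags in the list share a prefix: the four noun tags start with NN, the six verb tags with VB, and the four wh-tags with W. A small sketch that collapses a fine-grained tag into a coarse class by prefix (`coarse_class` and the class names are hypothetical, chosen for illustration):

```python
# Coarse classes keyed by tag prefix; checked in order, first match wins.
COARSE_PREFIXES = [
    ("NN", "noun"),       # NN, NNS, NNP, NNPS
    ("VB", "verb"),       # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),  # JJ, JJR, JJS
    ("RB", "adverb"),     # RB, RBR, RBS
    ("W", "wh-word"),     # WDT, WP, WP$, WRB
    ("PRP", "pronoun"),   # PRP, PRP$
]


def coarse_class(tag):
    """Map a Penn Treebank tag to a coarse class, or 'other' if no prefix matches."""
    for prefix, name in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return name
    return "other"
```

Tags without a listed prefix, such as CC, DT or RP, map to "other"; RP (particle) is not mistaken for an adverb because it does not start with "RB".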

We parse these queries into their grammatical phrases, which helps us identify the keywords used to search the database. These keywords are then compared against the database lexicon, which contains the relation names, attributes and values available in the database. The wh-word (query word) defines an operation in the SQL query, such as SELECT, and a noun of any form defines a keyword to be searched for in the database. The remaining tokens describe the relations between these keywords, defining what the user is actually looking for.
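The matching step described above can be sketched as follows, assuming the tagger's output is already available as (word, tag) pairs and assuming a tiny hypothetical schema: a single table `disease(name, symptoms)`. `to_sql`, the lexicon dictionaries and the choice to treat request verbs such as "Show" as a SELECT are illustrative assumptions, not the system's actual implementation. The sketch also ignores the connecting tokens (such as "of") that describe relations between keywords, and it handles single-table queries only.

```python
# Hypothetical lexicon for an imagined table disease(name, symptoms).
# A real system would build these from the schema and the stored values.
RELATIONS = {"disease"}
ATTRIBUTES = {"symptoms": "disease", "name": "disease"}  # attribute -> its relation
VALUES = {"cancer": ("disease", "name")}                 # value -> (relation, attribute)

WH_TAGS = {"WDT", "WP", "WP$", "WRB"}
# Assumption: imperative request verbs such as "Show" also trigger a SELECT.
REQUEST_VERBS = {"show", "list", "display", "give"}


def to_sql(tagged):
    """Turn (word, tag) pairs into (sql, params), or None if no query can be built."""
    operation = None
    keywords = []
    for word, tag in tagged:
        if tag in WH_TAGS or (tag == "VB" and word.lower() in REQUEST_VERBS):
            operation = "SELECT"
        elif tag.startswith("NN"):  # NN, NNS, NNP, NNPS
            keywords.append(word.lower())
    if operation is None:
        return None

    columns, conditions, params, tables = [], [], [], set()
    for kw in keywords:
        if kw in ATTRIBUTES:    # keyword names a column
            columns.append(kw)
            tables.add(ATTRIBUTES[kw])
        elif kw in VALUES:      # keyword is a stored value -> WHERE clause
            relation, attribute = VALUES[kw]
            conditions.append(f"{attribute} = ?")
            params.append(kw)
            tables.add(relation)
        elif kw in RELATIONS:   # keyword names a table
            tables.add(kw)
    if len(tables) != 1:
        return None  # joins across several relations are out of scope here

    sql = f"{operation} {', '.join(columns) or '*'} FROM {tables.pop()}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params
```

For the example query the function returns `('SELECT symptoms FROM disease WHERE name = ?', ['cancer'])`. Passing the value separately as a query parameter (the `?` placeholder style used by Python's `sqlite3`) keeps the user's spoken words out of the SQL string itself.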